Abstract. We consider a distributed multi-agent network system where the goal is to minimize 
a sum of convex objective functions of the agents subject to a common convex constraint set. Each 
agent maintains an iterate sequence and communicates the iterates to its neighbors. Then, each agent 
combines weighted averages of the received iterates with its own iterate, and adjusts the iterate by 
using subgradient information (known with stochastic errors) of its own function and by projecting 
onto the constraint set. 

The goal of this paper is to explore the effects of stochastic subgradient errors on the convergence 
of the algorithm. We first consider the behavior of the algorithm in mean, and then the convergence 
with probability 1 and in mean square. We consider general stochastic errors that have uniformly 
bounded second moments and obtain bounds on the limiting performance of the algorithm in mean 
for diminishing and non-diminishing stepsizes. When the means of the errors diminish, we prove 
that there is mean consensus between the agents and mean convergence to the optimum function 
value for diminishing stepsizes. When the mean errors diminish sufficiently fast, we strengthen the 
results to consensus and convergence of the iterates to an optimal solution with probability 1 and in 
mean square. 
1. Introduction. A number of problems that arise in the context of wired and 
wireless networks can be posed as the minimization of a sum of functions, when each 
component function is available only to a specific agent [23,25,26]. Often, it is not 
efficient, or not possible, for the network agents to share their objective functions with 
each other or with a central coordinator. In such scenarios, distributed algorithms 
that only require the agents to locally exchange limited and high level information 
are preferable. For example, in a large wireless network, energy is a scarce resource 
and it might not be efficient for a central coordinator to learn the individual objective 
functions from each and every agent [23]. In a network of databases from which 
information is to be mined, privacy considerations may not allow the sharing of the 
objective functions [34]. In a distributed network on a single chip, for the chip to 
be fault tolerant, it is desirable to perform the processing in a distributed manner to 
account for the statistical process variations [32]. 

We consider constrained minimization of a sum of convex functions, where each 
component function is known partially (with stochastic errors) to a specific network 
agent. The algorithm proposed builds on the distributed algorithm proposed in [19] 
for the unconstrained minimization problem. Each agent maintains an iterate se- 
quence and communicates the iterates to its neighbors. Then, each agent averages 
the received iterates with its own iterate, and adjusts the iterate by using subgradi- 
ent information (known with stochastic errors) of its own function and by projecting 
onto the constraint set. The inter-agent information exchange model is a synchronous 
and delayless version of the computational model proposed by Tsitsiklis [30]. The 
algorithm is distributed since there is no central coordinator. The algorithm is local 
since each agent uses only locally available information (its objective function) and 
communicates locally with its immediate neighbors. 

*The first and the third authors are with the Electrical and Computer Engineering Department 
at the University of Illinois at Urbana-Champaign. The second author is with the Industrial and 
Enterprise Systems Engineering Department at the University of Illinois at Urbana-Champaign. They 
can be contacted at {ssriniv5,angelia,vvv}@illinois.edu. This work has been supported by NSF 
Career Grant CMMI 07-42538. 

S. Sundhar Ram, A. Nedić and V. V. Veeravalli 

Related to this work are the distributed incremental algorithms, where the net- 
work agents sequentially update an iterate sequence in a cyclic or a random or- 
der [5, 12, 16, 23, 25]. The effects of stochastic errors on these algorithms have been 
investigated in [3,9,14,17,23,25,28]. In an incremental algorithm, there is a single 
iterate sequence and only one agent updates the iterate at a given time. Thus, while 
being distributed and local, incremental algorithms differ fundamentally from the al- 
gorithm studied in this paper (where all agents update simultaneously). Also related 
are the optimization algorithms in [2,31]. However, these algorithms are not local as 
the complete objective function information is available to each and every agent, with 
the aim of distributing the processing. 

The work in this paper is also related at a much broader level to the distributed 
consensus algorithms [2,11,13,15,18,20,21,29-31,33]. In these algorithms, each agent 
starts with a different value and through local information exchange, the agents 
eventually agree on a common value. The effect of random errors on consensus algorithms 
has been investigated in [10,13,15,33]. In addition, since we are interested in the 
effect of stochastic errors, our paper is also related to the literature on stochastic 
subgradient methods [6-8]. 

We consider general stochastic errors that have uniformly bounded second mo- 
ments and obtain bounds on the limiting performance of the algorithm in mean for 
diminishing and non-diminishing stepsizes. When the means of the errors diminish, 
we prove that there is mean consensus between the agents and mean convergence to 
the optimum function value for diminishing stepsizes. When the mean errors diminish 
sufficiently fast, we strengthen the results to consensus and convergence of the iterates 
to an optimal solution with probability 1 and in mean square. 

Our work expands the multi-agent distributed optimization framework studied 
in [19]. The new contributions are: 1) the study of the effects of stochastic errors 
in subgradient evaluations; 2) the consideration of a constrained optimization problem 
within the distributed multi-agent setting. The presence of the constraint set 
complicates the analysis as it introduces non-linearities in the system dynamics. The 
non-linearity issues that we face have some similarities to those in the constrained 
consensus problem investigated in [20], though the problems are fundamentally 
different. The presence of subgradient stochastic errors adds another layer of complexity 
to the analysis as the errors made by each agent propagate through the network to 
every other agent and also across time, making the iterates statistically dependent 
across time and agents. 

The rest of the paper is organized as follows. In Section 2 we formulate the 
problem, describe the algorithm and state our basic assumptions. In Section 3 we 
state some results from the literature that we use in the analysis, while in Section 4 we 
derive two important lemmas that form the backbone of the analysis. In Section 5 we 
study the convergence properties of the method in mean, and in Section 6 we focus on 
the convergence properties with probability 1 and in mean square. Finally, we discuss 
some implications and provide some concluding remarks in Sections 7 and 8. 

2. Problem, algorithm and assumptions. In this section, we formulate the 
problem of interest and describe the algorithm that we propose. We also state and 
discuss our assumptions on the agent connectivity and information exchange. 
2.1. Problem. We consider a network of m agents that are indexed by 1, ..., m. 
Often, when convenient, we index the agents by using the set $V = \{1, \dots, m\}$. The 
network objective is to solve the following constrained optimization problem: 

$$\text{minimize } \sum_{i=1}^m f_i(x) \quad \text{subject to } x \in X, \tag{2.1}$$

where $X \subseteq \mathbb{R}^n$ is a constraint set and $f_i : X \to \mathbb{R}$ for all i. Related to the problem 
we use the following notation 

$$f(x) = \sum_{i=1}^m f_i(x), \qquad f^* = \min_{x \in X} f(x), \qquad X^* = \{x \in X : f(x) = f^*\}.$$

We are interested in the case when the problem in (2.1) is convex. Specifically, 
we assume that the following assumption holds. 

Assumption 1. The functions $f_i$ and the set X are such that 

(a) The set X is closed and convex. 

(b) The functions $f_i$, $i \in V$, are defined and convex over an open set that contains 
the set X. 

The function $f_i$ is known only partially to agent i in the sense that the agent can 
only obtain a noisy estimate of the function subgradient. The goal is to solve problem 
(2.1) using an algorithm that is distributed and local. 

We make no assumption on the differentiability of the functions $f_i$. At points 
where the gradient does not exist, we use the notion of subgradients. A vector $\nabla f_i(x)$ is 
a subgradient of $f_i$ at a point $x \in \operatorname{dom} f_i$ if the following relation holds 

$$\nabla f_i(x)^T (y - x) \le f_i(y) - f_i(x) \quad \text{for all } y \in \operatorname{dom} f_i. \tag{2.2}$$

Since the set X is contained in an open set over which the functions are defined and 
convex, a subgradient of $f_i$ exists at any point of the set X (see [1] or [27]). 

2.2. Algorithm. To solve the problem in (2.1) with its inherent decentralized 
information access, we consider an iterative subgradient method. The iterations are 
distributed accordingly among the agents, whereby each agent i is minimizing its 
convex objective $f_i$ over the set X and locally exchanging the iterates with its neighbors. 

Let $w_{i,k}$ be the iterate with agent i at the end of iteration k. At the beginning of 
iteration k + 1, agent i receives the current iterate of a subset of the agents. Then, 
agent i computes a weighted average of these iterates and adjusts this average along 
the negative subgradient direction of $f_i$, which is computed with stochastic errors. 
The adjusted iterate is then projected onto the constraint set X. Mathematically, 
each agent i generates its iterate sequence $\{w_{i,k}\}$ according to the following relation: 

$$w_{i,k+1} = P_X\left[v_{i,k} - \alpha_{k+1}\left(\nabla f_i(v_{i,k}) + \epsilon_{i,k+1}\right)\right], \tag{2.3}$$

starting with some initial iterate $w_{i,0} \in X$. Here, $\nabla f_i(v_{i,k})$ denotes the subgradient of 
$f_i$ at $v_{i,k}$ and $\epsilon_{i,k+1}$ is the stochastic error in the subgradient evaluation. The scalar 
$\alpha_{k+1} > 0$ is the stepsize and $P_X$ denotes the Euclidean projection onto the set X. 
The vector $v_{i,k}$ is the weighted average computed by agent i and is given by 

$$v_{i,k} = \sum_{j \in N_i(k+1)} a_{i,j}(k+1)\, w_{j,k}, \tag{2.4}$$

where $N_i(k+1)$ denotes the set of agents whose current iterates are available to agent 
i in the (k + 1)-st iteration. We assume that $i \in N_i(k+1)$ for all agents and at all 
times k. The scalars $a_{i,j}(k+1)$ are the non-negative weights that agent i assigns to 
agent j's iterate. We will find it convenient to define $a_{i,j}(k+1)$ as 0 for $j \notin N_i(k+1)$ 
and rewrite (2.4) as 

$$v_{i,k} = \sum_{j=1}^m a_{i,j}(k+1)\, w_{j,k}. \tag{2.5}$$

This is a "consensus"-based step ensuring that, in the long run, the information of each 
$f_i$ reaches every agent with the same frequency, directly or through a sequence of local 
communications. Due to this, the iterates $w_{j,k}$ become eventually "the same" for all 
j and for large enough k. The update step in (2.3) is just a subgradient iteration for 
minimizing $f_i$ over X taken after the "consensus"-based step. 
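
The iteration (2.3)-(2.5) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and not part of the paper: the constraint set is taken to be a box (so the projection is a coordinate-wise clip), the weight matrix is fixed over time, the errors are zero-mean Gaussian, and the names `project_box` and `dssp_step` are hypothetical.

```python
import random

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (an assumed choice of X)."""
    return [min(max(xi, lo), hi) for xi in x]

def dssp_step(w, A, grads, alpha, noise_std, lo, hi, rng):
    """One synchronous iteration of (2.3)-(2.5) for all agents.

    w     : list of m iterates w_{j,k}, each a list of n floats
    A     : m x m weight matrix with entries a_{i,j}(k+1)
    grads : list of m callables; grads[i](v) returns a subgradient of f_i at v
    """
    m, n = len(w), len(w[0])
    new_w = []
    for i in range(m):
        # Consensus step (2.5): v_{i,k} = sum_j a_{i,j}(k+1) w_{j,k}
        v = [sum(A[i][j] * w[j][d] for j in range(m)) for d in range(n)]
        g = grads[i](v)
        # Noisy subgradient: grad f_i(v_{i,k}) + eps_{i,k+1} (Gaussian error assumed)
        noisy = [g[d] + rng.gauss(0.0, noise_std) for d in range(n)]
        # Projected subgradient step (2.3)
        step = [v[d] - alpha * noisy[d] for d in range(n)]
        new_w.append(project_box(step, lo, hi))
    return new_w

# Hypothetical example: f_i(x) = (x - c_i)^2 on X = [-10, 10], three agents.
centers = [1.0, 2.0, 6.0]
grads = [lambda v, c=c: [2.0 * (v[0] - c)] for c in centers]
A = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]  # doubly stochastic
rng = random.Random(0)
w = [[-5.0], [0.0], [8.0]]
for k in range(5000):
    w = dssp_step(w, A, grads, alpha=1.0 / (k + 1), noise_std=0.1,
                  lo=-10.0, hi=10.0, rng=rng)
```

In this toy problem the sum of the objectives is minimized at the mean of the `centers`, so with a diminishing stepsize and zero-mean errors all three iterates should settle close to 3.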

2.3. Additional assumptions. In addition to Assumption 1, we make some 
assumptions on the inter-agent exchange model and the weights. The first 
assumption requires the agents to communicate sufficiently often so that all the component 
functions, directly or indirectly, influence the iterate sequence of any agent. Recall 
that we defined $N_i(k+1)$ as the set of agents that agent i communicates with in 
iteration k + 1. Define $(V, E_{k+1})$ to be the graph with edges 

$$E_{k+1} = \{(j, i) : j \in N_i(k+1),\ i \in V\}.$$

Assumption 2. There exists a scalar Q such that the graph $\left(V, \bigcup_{\ell=1,\dots,Q} E_{k+\ell}\right)$ is 
strongly connected for all k. 

It is also important that the influence of the functions $f_i$ is "equal" in the long 
run so that the sum of the component functions is minimized rather than a weighted 
sum of them. The influence of a component $f_j$ on the iterates of agent i depends 
on the weights that agent i uses. To ensure equal influence, we make the following 
assumption on the weights. 

Assumption 3. For $i \in V$ and all k, 

(a) $a_{i,j}(k+1) \ge 0$, and $a_{i,j}(k+1) = 0$ when $j \notin N_i(k+1)$, 

(b) $\sum_{j=1}^m a_{i,j}(k+1) = 1$, 

(c) There exists a scalar $\eta$, $0 < \eta < 1$, such that $a_{i,j}(k+1) \ge \eta$ when $j \in N_i(k+1)$, 

(d) $\sum_{i=1}^m a_{i,j}(k+1) = 1$. 

Assumptions 3a and 3b state that each agent calculates a weighted average of 
all the iterates it has access to. Assumption 3c ensures that each agent gives a 
sufficient weight to its current iterate and all the iterates it receives². Assumption 3d, 
together with Assumption 2, as we will see later, ensures that all the agents are equally 
influential in the long run. In other words, Assumption 3d is crucial to ensure that 
$\sum_{i=1}^m f_i$ is minimized as opposed to a weighted sum of the functions $f_i$ with non-equal 
weights. To satisfy Assumption 3d, the agents need to coordinate their weights. Some 
coordination schemes are discussed in [19,25]. 
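
As one concrete illustration of weight coordination, the sketch below builds Metropolis-type weights. This is an assumed, standard construction and not necessarily one of the schemes in [19,25]; it also assumes undirected (bidirectional) communication, which is more restrictive than the directed model above. Each agent only needs its own degree and the degrees of its neighbors. The function name `metropolis_weights` is hypothetical.

```python
def metropolis_weights(neighbors):
    """Build a symmetric, doubly stochastic weight matrix for an undirected graph.

    neighbors[i] is the set of agents adjacent to agent i (excluding i itself).
    """
    m = len(neighbors)
    deg = [len(neighbors[i]) for i in range(m)]
    A = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in neighbors[i]:
            # Symmetric in (i, j), so column sums equal row sums.
            A[i][j] = 1.0 / (1 + max(deg[i], deg[j]))
        # The self-weight absorbs the remainder so that row i sums to 1.
        A[i][i] = 1.0 - sum(A[i])
    return A

# Hypothetical 4-agent path graph 0 - 1 - 2 - 3.
A = metropolis_weights([{1}, {0, 2}, {1, 3}, {2}])
row_sums = [sum(row) for row in A]
col_sums = [sum(A[i][j] for i in range(4)) for j in range(4)]
```

Under these assumptions the rows and columns both sum to 1, matching Assumptions 3b and 3d, and every weight on an available iterate is bounded away from zero, as Assumption 3c requires.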



2 The agents need not be aware of the common bound η. 
3. Preliminaries. In this section, we state some results for future reference. 
3.1. Euclidean norm inequalities. For any vectors $v_1, \dots, v_M \in \mathbb{R}^n$, we have 

$$\sum_{i=1}^M \left\| v_i - \frac{1}{M}\sum_{j=1}^M v_j \right\|^2 \le \sum_{i=1}^M \| v_i - x \|^2 \quad \text{for any } x \in \mathbb{R}^n. \tag{3.1}$$

The preceding relation states that the average of a finite set of vectors minimizes the 
sum of squared distances between each vector and any vector in $\mathbb{R}^n$, which can be verified 
using the first-order optimality conditions. 

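Both the Euclidean norm and its square are convex functions, i.e., for any vectors 
$v_1, \dots, v_M \in \mathbb{R}^n$ and nonnegative scalars $\beta_1, \dots, \beta_M$ such that $\sum_{i=1}^M \beta_i = 1$, we have 

$$\left\| \sum_{i=1}^M \beta_i v_i \right\| \le \sum_{i=1}^M \beta_i \| v_i \|, \tag{3.2}$$

$$\left\| \sum_{i=1}^M \beta_i v_i \right\|^2 \le \sum_{i=1}^M \beta_i \| v_i \|^2. \tag{3.3}$$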



The following inequality is the well-known³ non-expansive property of the Euclidean 
projection onto a nonempty, closed and convex set X, 

$$\| P_X[x] - P_X[y] \| \le \| x - y \| \quad \text{for all } x, y \in \mathbb{R}^n. \tag{3.4}$$
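
A quick numerical sanity check of the non-expansive property can be sketched as follows. The set X is assumed here to be the box $[-1, 1]^3$, for which the projection is a coordinate-wise clip; the helper names are hypothetical.

```python
import random

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (assumed constraint set)."""
    return [min(max(xi, lo), hi) for xi in x]

def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return sum(xi * xi for xi in x) ** 0.5

rng = random.Random(1)
worst_ratio = 0.0
for _ in range(1000):
    x = [rng.uniform(-5, 5) for _ in range(3)]
    y = [rng.uniform(-5, 5) for _ in range(3)]
    px, py = project_box(x, -1.0, 1.0), project_box(y, -1.0, 1.0)
    d = norm([a - b for a, b in zip(x, y)])
    if d > 0:
        # Ratio ||P_X[x] - P_X[y]|| / ||x - y||, which (3.4) bounds by 1.
        worst_ratio = max(worst_ratio, norm([a - b for a, b in zip(px, py)]) / d)
```

The largest observed ratio stays at or below 1, in line with the inequality above; this is a spot check on random points, not a proof.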



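3.2. Scalar sequences. For a scalar $\beta$ and a scalar sequence $\{\gamma_k\}$, we consider 
the "convolution" sequence $\sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell = \beta^k\gamma_0 + \beta^{k-1}\gamma_1 + \cdots + \beta\gamma_{k-1} + \gamma_k$. We 
have the following result. 

Lemma 3.1. Let $\{\gamma_k\}$ be a scalar sequence. 

(a) If $\lim_{k\to\infty} \gamma_k = \gamma$ and $0 < \beta < 1$, then $\lim_{k\to\infty} \sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell = \frac{\gamma}{1-\beta}$. 

(b) If $\gamma_k \ge 0$ for all k, $\sum_k \gamma_k < \infty$ and $0 < \beta < 1$, then $\sum_{k=0}^\infty \left( \sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell \right) < \infty$. 

(c) If $\limsup_{k\to\infty} \gamma_k = \gamma$ and $\{\zeta_k\}$ is a positive scalar sequence with $\sum_{k=1}^\infty \zeta_k = \infty$, 
then $\limsup_{K\to\infty} \frac{\sum_{k=0}^K \gamma_k\zeta_k}{\sum_{k=0}^K \zeta_k} \le \gamma$. In addition, if $\liminf_{k\to\infty} \gamma_k = \gamma$, then 
$\lim_{K\to\infty} \frac{\sum_{k=0}^K \gamma_k\zeta_k}{\sum_{k=0}^K \zeta_k} = \gamma$. 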

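Proof. (a) Let $\epsilon > 0$ be arbitrary. Since $\gamma_k \to \gamma$, there is an index K 
such that $|\gamma_k - \gamma| \le \epsilon$ for all $k \ge K$. For all $k \ge K + 1$, we have 

$$\sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell = \sum_{\ell=0}^K \beta^{k-\ell}\gamma_\ell + \sum_{\ell=K+1}^k \beta^{k-\ell}\gamma_\ell \le \max_{0 \le t \le K} \gamma_t \sum_{\ell=0}^K \beta^{k-\ell} + (\gamma + \epsilon) \sum_{\ell=K+1}^k \beta^{k-\ell}.$$

Since $\sum_{\ell=K+1}^k \beta^{k-\ell} \le \frac{1}{1-\beta}$ and 

$$\sum_{\ell=0}^K \beta^{k-\ell} = \beta^k + \cdots + \beta^{k-K} = \beta^{k-K}\left(1 + \cdots + \beta^K\right) \le \frac{\beta^{k-K}}{1-\beta},$$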

3 See for example [1], Proposition 2.2.1. 
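it follows that for all $k \ge K + 1$, 

$$\sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell \le \left( \max_{0 \le t \le K} \gamma_t \right) \frac{\beta^{k-K}}{1-\beta} + \frac{\gamma + \epsilon}{1-\beta}.$$

Therefore, 

$$\limsup_{k\to\infty} \sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell \le \frac{\gamma + \epsilon}{1-\beta}.$$

Since $\epsilon$ is arbitrary, we conclude that $\limsup_{k\to\infty} \sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell \le \frac{\gamma}{1-\beta}$. 
Similarly, we have 

$$\sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell \ge \min_{0 \le t \le K} \gamma_t \sum_{\ell=0}^K \beta^{k-\ell} + (\gamma - \epsilon) \sum_{\ell=K+1}^k \beta^{k-\ell}.$$

Thus, 

$$\liminf_{k\to\infty} \sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell \ge \liminf_{k\to\infty} \left( \min_{0 \le t \le K} \gamma_t \sum_{\ell=0}^K \beta^{k-\ell} + (\gamma - \epsilon) \sum_{\ell=K+1}^k \beta^{k-\ell} \right).$$

Since $\sum_{\ell=0}^K \beta^{k-\ell} \ge \beta^{k-K}$ and $\sum_{\ell=K+1}^k \beta^{k-\ell} = \sum_{s=0}^{k-(K+1)} \beta^s$, which tends to $1/(1-\beta)$ 
as $k \to \infty$, it follows that 

$$\liminf_{k\to\infty} \sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell \ge \left( \min_{0 \le t \le K} \gamma_t \right) \lim_{k\to\infty} \beta^{k-K} + (\gamma - \epsilon) \lim_{k\to\infty} \sum_{s=0}^{k-(K+1)} \beta^s = \frac{\gamma - \epsilon}{1-\beta}.$$

Since $\epsilon$ is arbitrary, we have $\liminf_{k\to\infty} \sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell \ge \frac{\gamma}{1-\beta}$. This and the relation 
$\limsup_{k\to\infty} \sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell \le \frac{\gamma}{1-\beta}$ imply 

$$\lim_{k\to\infty} \sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell = \frac{\gamma}{1-\beta}.$$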

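(b) Let $\sum_{k=0}^\infty \gamma_k < \infty$. For any integer $M \ge 1$, we have 

$$\sum_{k=0}^M \left( \sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell \right) = \sum_{\ell=0}^M \gamma_\ell \sum_{t=0}^{M-\ell} \beta^t \le \frac{1}{1-\beta} \sum_{\ell=0}^M \gamma_\ell,$$

implying that 

$$\sum_{k=0}^\infty \left( \sum_{\ell=0}^k \beta^{k-\ell}\gamma_\ell \right) \le \frac{1}{1-\beta} \sum_{\ell=0}^\infty \gamma_\ell < \infty.$$

(c) Since $\limsup_{k\to\infty} \gamma_k = \gamma$, for every $\epsilon > 0$ there is a large enough K such that 
$\gamma_k \le \gamma + \epsilon$ for all $k \ge K$. Thus, for any $M > K$, 

$$\frac{\sum_{k=0}^M \gamma_k\zeta_k}{\sum_{k=0}^M \zeta_k} = \frac{\sum_{k=0}^K \gamma_k\zeta_k}{\sum_{k=0}^M \zeta_k} + \frac{\sum_{k=K+1}^M \gamma_k\zeta_k}{\sum_{k=0}^M \zeta_k} \le \frac{\sum_{k=0}^K \gamma_k\zeta_k}{\sum_{k=0}^M \zeta_k} + (\gamma + \epsilon) \frac{\sum_{k=K+1}^M \zeta_k}{\sum_{k=0}^M \zeta_k}.$$

By letting $M \to \infty$ and using $\sum_k \zeta_k = \infty$, we see that $\limsup_{M\to\infty} \frac{\sum_{k=0}^M \gamma_k\zeta_k}{\sum_{k=0}^M \zeta_k} \le \gamma + \epsilon$, 
and since $\epsilon$ is arbitrary, the result for the limit superior follows. 

Analogously, if $\liminf_{k\to\infty} \gamma_k = \gamma$, then for every $\epsilon > 0$ there is a large enough K 
such that $\gamma_k \ge \gamma - \epsilon$ for all $k \ge K$. Thus, for any $M > K$, 

$$\frac{\sum_{k=0}^M \gamma_k\zeta_k}{\sum_{k=0}^M \zeta_k} \ge \frac{\sum_{k=0}^K \gamma_k\zeta_k}{\sum_{k=0}^M \zeta_k} + (\gamma - \epsilon) \frac{\sum_{k=K+1}^M \zeta_k}{\sum_{k=0}^M \zeta_k}.$$

Letting $M \to \infty$ and using $\sum_k \zeta_k = \infty$, we obtain $\liminf_{M\to\infty} \frac{\sum_{k=0}^M \gamma_k\zeta_k}{\sum_{k=0}^M \zeta_k} \ge \gamma - \epsilon$. 
Since $\epsilon > 0$ is arbitrary, we have $\liminf_{M\to\infty} \frac{\sum_{k=0}^M \gamma_k\zeta_k}{\sum_{k=0}^M \zeta_k} \ge \gamma$. This relation and the 
relation for the limit superior yield $\lim_{M\to\infty} \frac{\sum_{k=0}^M \gamma_k\zeta_k}{\sum_{k=0}^M \zeta_k} = \gamma$ when $\gamma_k \to \gamma$. □ 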
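
Part (a) of Lemma 3.1 can be illustrated numerically with a short sketch. The particular sequence $\gamma_k = \gamma + 1/(k+1)$ and the values of $\beta$ and $\gamma$ are assumptions chosen for illustration; the function name is hypothetical. The sketch uses the recursion $s_k = \beta s_{k-1} + \gamma_k$, which reproduces the convolution sum term by term.

```python
def convolution_sequence(beta, gammas):
    """Return s_k = sum_{l=0}^k beta^(k-l) * gamma_l for every k.

    Uses the equivalent recursion s_k = beta * s_{k-1} + gamma_k with s_{-1} = 0.
    """
    out, s = [], 0.0
    for g in gammas:
        s = beta * s + g
        out.append(s)
    return out

beta, gamma = 0.5, 1.0
gammas = [gamma + 1.0 / (k + 1) for k in range(10000)]  # gamma_k -> gamma
s = convolution_sequence(beta, gammas)
limit = gamma / (1.0 - beta)  # value predicted by Lemma 3.1(a)
```

For these assumed values the last entry of `s` is within a small tolerance of `limit`, as the lemma predicts.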

3.3. Matrix convergence. Let A(k) be the matrix with (i, j)-th entry equal to 
$a_{i,j}(k)$. As a consequence of Assumptions 3a, 3b and 3d, the matrix A(k) is doubly 
stochastic⁴. Define, for all k, s with k > s, 

$$\Phi(k, s) = A(k)A(k-1) \cdots A(s+1). \tag{3.5}$$

We next state a result from [18] (Corollary 1) on the convergence properties of the 
matrix $\Phi(k, s)$. Let $[\Phi(k, s)]_{i,j}$ denote the (i, j)-th entry of the matrix $\Phi(k, s)$, and let 
$e \in \mathbb{R}^m$ be the column vector with all entries equal to 1. 

Lemma 3.2. Let Assumptions 2 and 3 hold. Then 
1. $\lim_{k\to\infty} \Phi(k, s) = \frac{1}{m} e e^T$ for all s. 

Further, the convergence is geometric and the rate of convergence is given by 

$$\left| [\Phi(k, s)]_{i,j} - \frac{1}{m} \right| \le \theta \beta^{k-s},$$

where 

$$\theta = \left( 1 - \frac{\eta}{4m^2} \right)^{-2}, \qquad \beta = \left( 1 - \frac{\eta}{4m^2} \right)^{1/Q}.$$

3.4. Stochastic convergence. We next state some results that deal with the 
convergence of a sequence of random vectors. The first result is the well known Fatou's 
lemma [4]. 

Lemma 3.3. Let $\{X_n\}$ be a sequence of non-negative random variables. Then 

$$\mathsf{E}\left[ \liminf_{n\to\infty} X_n \right] \le \liminf_{n\to\infty} \mathsf{E}[X_n].$$

The next result is due to Robbins and Siegmund (Lemma 11, Chapter 2.2, [22]). 

Theorem 3.4. Let $\{B_k\}$, $\{D_k\}$, and $\{H_k\}$ be non-negative random sequences 
and let $\{\zeta_k\}$ be a deterministic nonnegative scalar sequence. Let $G_k$ be the σ-algebra 
generated by $B_1, \dots, B_k, D_1, \dots, D_k, H_1, \dots, H_k$. Suppose that $\sum_k \zeta_k < \infty$, 

$$\mathsf{E}[B_{k+1} \mid G_k] \le (1 + \zeta_k) B_k - D_k + H_k \quad \text{for all } k, \tag{3.6}$$

and $\sum_k H_k < \infty$ with probability 1. Then, the sequence $\{B_k\}$ converges to a 
non-negative random variable and $\sum_k D_k < \infty$ with probability 1, and in mean. 



4 The sum of its entries in every row and in every column is equal to 1. 
4. Basic relations. In this section, we derive two basic relations that form the 
basis for the analysis in this paper. The first of them deals with the disagreements 
among the agents, and the second deals with the agent iterate sequences. 

4.1. Disagreement Estimate. The agent disagreements are typically thought 
of as the norms $\|w_{i,k} - w_{j,k}\|$ of the differences between the iterates $w_{i,k}$ and $w_{j,k}$ 
generated by different agents according to (2.3)-(2.4). Alternatively, the agent 
disagreements can be measured with respect to a reference sequence, which we adopt 
here. In particular, we study the behavior of $\|y_k - w_{i,k}\|$, where $\{y_k\}$ is the auxiliary 
vector sequence defined by 

$$y_k = \frac{1}{m} \sum_{i=1}^m w_{i,k} \quad \text{for all } k. \tag{4.1}$$

In the next lemma, we provide a basic estimate for $\|y_k - w_{j,k}\|$. The rate of convergence 
result from Lemma 3.2 plays a crucial role in obtaining this estimate. 

Lemma 4.1. Let Assumptions 1a, 2 and 3 hold. Assume that the subgradients 
of $f_i$ are uniformly bounded over the set X, i.e., there are scalars $C_i$ such that 

$$\|\nabla f_i(x)\| \le C_i \quad \text{for all } x \in X \text{ and all } i \in V.$$

Then, for all $j \in V$ and $k \ge 0$, 

$$\|y_{k+1} - w_{j,k+1}\| \le m\theta\beta^{k+1} \max_{i \in V} \|w_{i,0}\| + \theta \sum_{\ell=1}^k \alpha_\ell \beta^{k+1-\ell} \sum_{i=1}^m \left( C_i + \|\epsilon_{i,\ell}\| \right) + \frac{\alpha_{k+1}}{m} \sum_{i=1}^m \left( C_i + \|\epsilon_{i,k+1}\| \right) + \alpha_{k+1} \left( C_j + \|\epsilon_{j,k+1}\| \right).$$

Proof. Define for all $i \in V$ and all k, 

$$p_{i,k+1} = w_{i,k+1} - \sum_{j=1}^m a_{i,j}(k+1)\, w_{j,k}. \tag{4.2}$$

Using the matrices $\Phi(k, s)$ defined in (3.5) we can write 

$$w_{j,k+1} = \sum_{i=1}^m [\Phi(k+1, 0)]_{j,i}\, w_{i,0} + p_{j,k+1} + \sum_{\ell=1}^k \left( \sum_{i=1}^m [\Phi(k+1, \ell)]_{j,i}\, p_{i,\ell} \right). \tag{4.3}$$

Using (4.2), we can also rewrite $y_k$, defined in (4.1), as follows 

$$y_{k+1} = \frac{1}{m} \left( \sum_{i=1}^m \sum_{j=1}^m a_{i,j}(k+1)\, w_{j,k} + \sum_{i=1}^m p_{i,k+1} \right).$$

In view of the double stochasticity of the weights, we have $\sum_{i=1}^m a_{i,j}(k+1) = 1$, 
implying that 

$$y_{k+1} = \frac{1}{m} \left( \sum_{j=1}^m w_{j,k} + \sum_{i=1}^m p_{i,k+1} \right) = y_k + \frac{1}{m} \sum_{i=1}^m p_{i,k+1}.$$

Distributed Stochastic Subgradient Algorithms for Convex Optimization 



Therefore 

$$y_{k+1} = y_0 + \frac{1}{m} \sum_{\ell=1}^{k+1} \sum_{i=1}^m p_{i,\ell} = \frac{1}{m} \sum_{i=1}^m w_{i,0} + \frac{1}{m} \sum_{\ell=1}^{k+1} \sum_{i=1}^m p_{i,\ell}. \tag{4.4}$$

Substituting for $y_{k+1}$ from (4.4) and for $w_{j,k+1}$ from (4.3), we obtain 

$$\|y_{k+1} - w_{j,k+1}\| = \left\| \sum_{i=1}^m \left( \frac{1}{m} - [\Phi(k+1, 0)]_{j,i} \right) w_{i,0} + \sum_{\ell=1}^k \sum_{i=1}^m \left( \frac{1}{m} - [\Phi(k+1, \ell)]_{j,i} \right) p_{i,\ell} + \left( \frac{1}{m} \sum_{i=1}^m p_{i,k+1} - p_{j,k+1} \right) \right\|.$$

Therefore, for all $j \in V$ and all k, 

$$\|y_{k+1} - w_{j,k+1}\| \le \sum_{i=1}^m \left| \frac{1}{m} - [\Phi(k+1, 0)]_{j,i} \right| \|w_{i,0}\| + \sum_{\ell=1}^k \sum_{i=1}^m \left| \frac{1}{m} - [\Phi(k+1, \ell)]_{j,i} \right| \|p_{i,\ell}\| + \frac{1}{m} \sum_{i=1}^m \|p_{i,k+1}\| + \|p_{j,k+1}\|.$$

We can bound $\|w_{i,0}\| \le \max_{i \in V} \|w_{i,0}\|$. Further, we can use the rate of 
convergence result from Lemma 3.2 to bound $\left| \frac{1}{m} - [\Phi(k, \ell)]_{j,i} \right|$. We obtain 

$$\|y_{k+1} - w_{j,k+1}\| \le m\theta\beta^{k+1} \max_{i \in V} \|w_{i,0}\| + \theta \sum_{\ell=1}^k \beta^{k+1-\ell} \sum_{i=1}^m \|p_{i,\ell}\| + \frac{1}{m} \sum_{i=1}^m \|p_{i,k+1}\| + \|p_{j,k+1}\|. \tag{4.5}$$

We next estimate the norms of the vectors $\|p_{i,k}\|$ for any k. From the definition 
of $p_{i,k+1}$ in (4.2) and the definition of the vector $v_{i,k}$ in (2.4), we have $p_{i,k+1} = 
w_{i,k+1} - v_{i,k}$. Note that, being a convex combination of vectors $w_{j,k}$ in the convex set 
X, the vector $v_{i,k}$ is in the set X. By the definition of the iterate $w_{i,k+1}$ in (2.3) and 
the non-expansive property of the Euclidean projection in (3.4), we have 

$$\|p_{i,k+1}\| = \left\| P_X\left[ v_{i,k} - \alpha_{k+1} \left( \nabla f_i(v_{i,k}) + \epsilon_{i,k+1} \right) \right] - v_{i,k} \right\| \le \alpha_{k+1} \left\| \nabla f_i(v_{i,k}) + \epsilon_{i,k+1} \right\| \le \alpha_{k+1} \left( C_i + \|\epsilon_{i,k+1}\| \right).$$

In the last step we have used the subgradient boundedness. By substituting the 
preceding relation in (4.5), we obtain the desired relation. □ 

4.2. Iterate Relation. Here, we derive a relation for the distances $\|v_{i,k+1} - z\|$ 
and the function value differences $f(y_k) - f(z)$ for an arbitrary $z \in X$. This relation 
together with Lemma 4.1 provides the basis for our subsequent convergence analysis. 
In what follows, recall that $f = \sum_{i=1}^m f_i$. 

Lemma 4.2. Let Assumptions 1, 2 and 3 hold. Assume that the subgradients of 
$f_i$ are uniformly bounded over the set X, i.e., there are scalars $C_i$ such that 

$$\|\nabla f_i(x)\| \le C_i \quad \text{for all } x \in X \text{ and all } i \in V.$$

Then, for any $z \in X$ and all k, 

$$\sum_{i=1}^m \|v_{i,k+1} - z\|^2 \le \sum_{i=1}^m \|v_{i,k} - z\|^2 - 2\alpha_{k+1} \left( f(y_k) - f(z) \right) + 2\alpha_{k+1} \left( \max_{i \in V} C_i \right) \sum_{j=1}^m \|y_k - w_{j,k}\| - 2\alpha_{k+1} \sum_{i=1}^m \epsilon_{i,k+1}^T (v_{i,k} - z) + \alpha_{k+1}^2 \sum_{i=1}^m \left( C_i + \|\epsilon_{i,k+1}\| \right)^2.$$

Proof. Using the Euclidean projection property in (3.4), from the definition of 
the iterate $w_{i,k+1}$ in (2.3), we have for any $z \in X$ and all k, 

$$\|w_{i,k+1} - z\|^2 = \left\| P_X\left[ v_{i,k} - \alpha_{k+1} \left( \nabla f_i(v_{i,k}) + \epsilon_{i,k+1} \right) \right] - z \right\|^2 \le \|v_{i,k} - z\|^2 - 2\alpha_{k+1} \nabla f_i(v_{i,k})^T (v_{i,k} - z) - 2\alpha_{k+1} \epsilon_{i,k+1}^T (v_{i,k} - z) + \alpha_{k+1}^2 \left\| \nabla f_i(v_{i,k}) + \epsilon_{i,k+1} \right\|^2.$$

By using the subgradient inequality in (2.2) to bound the second term, we obtain 

$$\|w_{i,k+1} - z\|^2 \le \|v_{i,k} - z\|^2 - 2\alpha_{k+1} \left( f_i(v_{i,k}) - f_i(z) \right) - 2\alpha_{k+1} \epsilon_{i,k+1}^T (v_{i,k} - z) + \alpha_{k+1}^2 \left\| \nabla f_i(v_{i,k}) + \epsilon_{i,k+1} \right\|^2. \tag{4.6}$$

Note that by the convexity of the squared norm [cf. Eq. (3.3)], we have 

$$\sum_{i=1}^m \|v_{i,k+1} - z\|^2 = \sum_{i=1}^m \left\| \sum_{j=1}^m a_{i,j}(k+2)\, w_{j,k+1} - z \right\|^2 \le \sum_{i=1}^m \sum_{j=1}^m a_{i,j}(k+2) \|w_{j,k+1} - z\|^2.$$

In view of Assumption 3, we have $\sum_{i=1}^m a_{i,j}(k+2) = 1$ for all j and k, implying that 

$$\sum_{i=1}^m \|v_{i,k+1} - z\|^2 \le \sum_{j=1}^m \|w_{j,k+1} - z\|^2.$$

By summing the relations in (4.6) over all $i \in V$ and by using the preceding 
relation, we obtain 

$$\sum_{i=1}^m \|v_{i,k+1} - z\|^2 \le \sum_{i=1}^m \|v_{i,k} - z\|^2 - 2\alpha_{k+1} \sum_{i=1}^m \left( f_i(v_{i,k}) - f_i(z) \right) - 2\alpha_{k+1} \sum_{i=1}^m \epsilon_{i,k+1}^T (v_{i,k} - z) + \alpha_{k+1}^2 \sum_{i=1}^m \left\| \nabla f_i(v_{i,k}) + \epsilon_{i,k+1} \right\|^2. \tag{4.7}$$
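From (2.2) we have 

$$f_i(v_{i,k}) - f_i(z) = \left( f_i(v_{i,k}) - f_i(y_k) \right) + \left( f_i(y_k) - f_i(z) \right) \ge -\|\nabla f_i(v_{i,k})\| \|y_k - v_{i,k}\| + \left( f_i(y_k) - f_i(z) \right). \tag{4.8}$$

Recall that $v_{i,k} = \sum_{j=1}^m a_{i,j}(k+1)\, w_{j,k}$ [cf. (2.5)]. Substituting for $v_{i,k}$ and using the 
convexity of the norm [cf. (3.2)], from (4.8) we obtain 

$$\sum_{i=1}^m \left( f_i(v_{i,k}) - f_i(z) \right) \ge -\sum_{i=1}^m \|\nabla f_i(v_{i,k})\| \|y_k - v_{i,k}\| + \left( f(y_k) - f(z) \right)$$

$$= -\sum_{i=1}^m \|\nabla f_i(v_{i,k})\| \left\| y_k - \sum_{j=1}^m a_{i,j}(k+1)\, w_{j,k} \right\| + \left( f(y_k) - f(z) \right)$$

$$\ge -\sum_{i=1}^m \|\nabla f_i(v_{i,k})\| \sum_{j=1}^m a_{i,j}(k+1) \|y_k - w_{j,k}\| + \left( f(y_k) - f(z) \right)$$

$$\ge -\left( \max_{i \in V} \|\nabla f_i(v_{i,k})\| \right) \sum_{j=1}^m \left( \sum_{i=1}^m a_{i,j}(k+1) \right) \|y_k - w_{j,k}\| + \left( f(y_k) - f(z) \right)$$

$$= -\left( \max_{i \in V} \|\nabla f_i(v_{i,k})\| \right) \sum_{j=1}^m \|y_k - w_{j,k}\| + \left( f(y_k) - f(z) \right).$$

By using the preceding estimate in relation (4.7), we have 

$$\sum_{i=1}^m \|v_{i,k+1} - z\|^2 \le \sum_{i=1}^m \|v_{i,k} - z\|^2 - 2\alpha_{k+1} \left( f(y_k) - f(z) \right) + 2\alpha_{k+1} \left( \max_{i \in V} \|\nabla f_i(v_{i,k})\| \right) \sum_{j=1}^m \|y_k - w_{j,k}\| - 2\alpha_{k+1} \sum_{i=1}^m \epsilon_{i,k+1}^T (v_{i,k} - z) + \alpha_{k+1}^2 \sum_{i=1}^m \left\| \nabla f_i(v_{i,k}) + \epsilon_{i,k+1} \right\|^2.$$

The result follows by using the subgradient norm boundedness, $\|\nabla f_i(v_{i,k})\| \le C_i$ for 
all k and i. □ 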

5. Convergence in mean. Here, we study the behavior of the iterates gener- 
ated by the algorithm, under the assumption that the errors have bounded norms in 
mean square. In particular, we assume the following. 

Assumption 4. The subgradient errors are uniformly bounded in mean square, 
i.e., there are scalars $\nu_i$ such that 

$$\mathsf{E}\left[ \|\epsilon_{i,k+1}\|^2 \right] \le \nu_i^2 \quad \text{for all } i \in V \text{ and all } k.$$

Using this assumption, we provide a bound on the expected disagreement $\mathsf{E}[\|w_{i,k} - y_k\|]$ 
for nondiminishing stepsize. We later use this bound to provide an estimate for the 
algorithm's performance in mean. The bound is provided in the following theorem. 

Theorem 5.1. Let Assumptions 1a, 2, 3 and 4 hold. Also, let the subgradients 
of each $f_i$ be uniformly bounded over X, i.e., for each $i \in V$ there is $C_i$ such that 

$$\|\nabla f_i(x)\| \le C_i \quad \text{for all } x \in X.$$

If the stepsize $\{\alpha_k\}$ is such that $\lim_{k\to\infty} \alpha_k = \alpha$ for some $\alpha \ge 0$, then for all $j \in V$, 

$$\limsup_{k\to\infty} \mathsf{E}[\|y_{k+1} - w_{j,k+1}\|] \le \alpha \max_{i \in V} \{C_i + \nu_i\} \left( 2 + \frac{m\theta\beta}{1-\beta} \right).$$

Proof. The conditions of Lemma 4.1 are satisfied. Taking the expectation in the 
relation of Lemma 4.1 and using the inequality $\mathsf{E}[\|\epsilon_{i,k}\|] \le \sqrt{\mathsf{E}[\|\epsilon_{i,k}\|^2]} \le \nu_i$, we obtain 
for all $j \in V$ and all k, 

$$\mathsf{E}[\|y_{k+1} - w_{j,k+1}\|] \le m\theta\beta^{k+1} \max_{i \in V} \|w_{i,0}\| + m\theta\beta \max_{i \in V} \{C_i + \nu_i\} \sum_{\ell=1}^k \beta^{k-\ell} \alpha_\ell + 2\alpha_{k+1} \max_{i \in V} \{C_i + \nu_i\}. \tag{5.1}$$

Since $\lim_{k\to\infty} \alpha_k = \alpha$, by Lemma 3.1(a) we have $\lim_{k\to\infty} \sum_{\ell=1}^k \beta^{k-\ell} \alpha_\ell = \frac{\alpha}{1-\beta}$. Using 
this relation and $\lim_{k\to\infty} \alpha_k = \alpha$, we obtain the result by taking the limit superior in 
(5.1) as $k \to \infty$. □ 

When the stepsize is diminishing (i.e., $\alpha = 0$), the result of Theorem 5.1 implies 
that the expected disagreements $\mathsf{E}[\|y_{k+1} - w_{j,k+1}\|]$ converge to 0 for all j. Thus, 
there is an asymptotic consensus in mean. We formally state this as a corollary. 

Corollary 5.2. Let the conditions of Theorem 5.1 hold with $\alpha = 0$. Then 
$\lim_{k\to\infty} \mathsf{E}[\|w_{j,k} - y_k\|] = 0$ for all $j \in V$. 

We next obtain bounds on the performance of the algorithm. We make the 
additional assumption that the set X is bounded. Thus, the subgradients of each fi 
are also bounded (see [1], Proposition 4.2.3). 

Note that, under Assumption 4, by Jensen's inequality we have $\|\mathsf{E}[\epsilon_{i,k+1}]\| \le \nu_i$. 
Therefore, under Assumption 4, 

$$\limsup_{k\to\infty} \|\mathsf{E}[\epsilon_{i,k+1}]\| \le \nu_i \quad \text{for all } i \in V. \tag{5.2}$$

We have used this relation in our analysis of the agent disagreements in Theorem 5.1. 
Using this relation, we obtain special results for the cases when the errors are zero 
mean or when their mean is diminishing, i.e., the cases $\mathsf{E}[\epsilon_{i,k+1}] = 0$ for all i, k, or 
$\limsup_{k\to\infty} \|\mathsf{E}[\epsilon_{i,k+1}]\| = 0$ for all i. 

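Theorem 5.3. Let Assumptions 1, 2, 3 and 4 hold. Assume that the set X is 
bounded. Let $\lim_{k\to\infty} \alpha_k = \alpha$ with $\alpha \ge 0$. If $\alpha = 0$, also assume that $\sum_k \alpha_k = \infty$. 
Then, for all $j \in V$, 

$$\liminf_{k\to\infty} \mathsf{E}[f(w_{j,k})] \le f^* + \max_{x,y \in X} \|x - y\| \sum_{i=1}^m \mu_i + m\alpha \left( \max_{i \in V} \{C_i + \nu_i\} \right)^2 \left( \frac{9}{2} + \frac{2m\theta\beta}{1-\beta} \right),$$

where $\mu_i = \limsup_{k\to\infty} \|\mathsf{E}[\epsilon_{i,k+1}]\|$ and $C_i$ is an upper bound on the subgradient norms 
of $f_i$ over the set X. 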

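Proof. Under Assumption 4, the limit superiors $\mu_i = \limsup_{k\to\infty} \|\mathsf{E}[\epsilon_{i,k+1}]\|$ are 
finite [cf. Eq. (5.2)]. Since the set X is bounded, the subgradients of $f_i$ over the set X 
are also bounded for each $i \in V$; hence, the bounds $C_i$, $i \in V$, on subgradient norms 
exist. Thus, the conditions of Lemma 4.2 are satisfied. Further, by Assumption 1, 
the set X is contained in the interior of the domain of f, over which the function is 
continuous (by convexity; see [27]). Thus, the set X is compact and f is continuous 
over X, implying that the optimal set $X^*$ is nonempty. Let $x^* \in X^*$, and let $z = x^*$ 
in Lemma 4.2. We have, for all k, 

$$\sum_{i=1}^m \|v_{i,k+1} - x^*\|^2 \le \sum_{i=1}^m \|v_{i,k} - x^*\|^2 - 2\alpha_{k+1} \left( f(y_k) - f^* \right) + 2\alpha_{k+1} \left( \max_{i \in V} C_i \right) \sum_{j=1}^m \|y_k - w_{j,k}\| - 2\alpha_{k+1} \sum_{i=1}^m \epsilon_{i,k+1}^T (v_{i,k} - x^*) + \alpha_{k+1}^2 \sum_{i=1}^m \left( C_i + \|\epsilon_{i,k+1}\| \right)^2.$$

Since X is bounded, by using $\|v_{i,k} - x^*\| \le \max_{x,y \in X} \|x - y\|$, taking the expectation 
and using the error bounds $\mathsf{E}[\|\epsilon_{i,k+1}\|^2] \le \nu_i^2$, we obtain 

$$\sum_{i=1}^m \mathsf{E}[\|v_{i,k+1} - x^*\|^2] \le \sum_{i=1}^m \mathsf{E}[\|v_{i,k} - x^*\|^2] - 2\alpha_{k+1} \left( \mathsf{E}[f(y_k)] - f^* \right) + 2\alpha_{k+1} \left( \max_{i \in V} C_i \right) \sum_{j=1}^m \mathsf{E}[\|y_k - w_{j,k}\|] + 2\alpha_{k+1} \max_{x,y \in X} \|x - y\| \sum_{i=1}^m \|\mathsf{E}[\epsilon_{i,k+1}]\| + \alpha_{k+1}^2 \sum_{i=1}^m (C_i + \nu_i)^2. \tag{5.3}$$

By rearranging the terms and summing over k = 1, ..., K, for an arbitrary K, we 
obtain 

$$2 \sum_{k=1}^K \alpha_{k+1} \left[ \left( \mathsf{E}[f(y_k)] - f^* \right) - \left( \max_{i \in V} C_i \right) \sum_{j=1}^m \mathsf{E}[\|y_k - w_{j,k}\|] - \max_{x,y \in X} \|x - y\| \sum_{i=1}^m \|\mathsf{E}[\epsilon_{i,k+1}]\| - \frac{m\alpha_{k+1}}{2} \left( \max_{i \in V} \{C_i + \nu_i\} \right)^2 \right]$$

$$\le \sum_{i=1}^m \mathsf{E}[\|v_{i,1} - x^*\|^2] - \sum_{i=1}^m \mathsf{E}[\|v_{i,K+1} - x^*\|^2] \le m \max_{x,y \in X} \|x - y\|^2.$$

Note that when $\alpha_{k+1} \to \alpha$ and $\alpha > 0$, we have $\sum_k \alpha_k = \infty$. When $\alpha = 0$, we have 
assumed that $\sum_k \alpha_k = \infty$. Therefore, by letting $K \to \infty$, we have 

$$\liminf_{k\to\infty} \left[ \mathsf{E}[f(y_k)] - \left( \max_{i \in V} C_i \right) \sum_{j=1}^m \mathsf{E}[\|y_k - w_{j,k}\|] - \max_{x,y \in X} \|x - y\| \sum_{i=1}^m \|\mathsf{E}[\epsilon_{i,k+1}]\| - \frac{m\alpha_{k+1}}{2} \left( \max_{i \in V} \{C_i + \nu_i\} \right)^2 \right] \le f^*.$$

Using $\limsup_{k\to\infty} \|\mathsf{E}[\epsilon_{i,k+1}]\| = \mu_i$ [see Eq. (5.2)] and $\lim_{k\to\infty} \alpha_k = \alpha$, we obtain 

$$\liminf_{k\to\infty} \mathsf{E}[f(y_k)] \le f^* + \frac{m\alpha}{2} \left( \max_{i \in V} \{C_i + \nu_i\} \right)^2 + \left( \max_{i \in V} C_i \right) \sum_{j=1}^m \limsup_{k\to\infty} \mathsf{E}[\|y_k - w_{j,k}\|] + \max_{x,y \in X} \|x - y\| \sum_{i=1}^m \mu_i.$$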
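Next, from the convexity inequality in (2.2) and the boundedness of the subgradients 
it follows that for all k and $j \in V$, 

$$\mathsf{E}[f(w_{j,k}) - f(y_k)] \le \left( \sum_{i=1}^m C_i \right) \mathsf{E}[\|y_k - w_{j,k}\|],$$

implying 

$$\liminf_{k\to\infty} \mathsf{E}[f(w_{j,k})] \le f^* + \frac{m\alpha}{2} \left( \max_{i \in V} \{C_i + \nu_i\} \right)^2 + \left( \max_{i \in V} C_i \right) \sum_{j=1}^m \limsup_{k\to\infty} \mathsf{E}[\|y_k - w_{j,k}\|] + \left( \sum_{i=1}^m C_i \right) \limsup_{k\to\infty} \mathsf{E}[\|y_k - w_{j,k}\|] + \max_{x,y \in X} \|x - y\| \sum_{i=1}^m \mu_i.$$

By Theorem 5.1, we have for all $j \in V$, 

$$\limsup_{k\to\infty} \mathsf{E}[\|y_k - w_{j,k}\|] \le \alpha \max_{i \in V} \{C_i + \nu_i\} \left( 2 + \frac{m\theta\beta}{1-\beta} \right).$$

By using the preceding relation, we see that 

$$\liminf_{k\to\infty} \mathsf{E}[f(w_{j,k})] \le f^* + \frac{m\alpha}{2} \left( \max_{i \in V} \{C_i + \nu_i\} \right)^2 + \max_{x,y \in X} \|x - y\| \sum_{i=1}^m \mu_i + 2m\alpha \left( \max_{i \in V} C_i \right) \max_{i \in V} \{C_i + \nu_i\} \left( 2 + \frac{m\theta\beta}{1-\beta} \right)$$

$$\le f^* + \max_{x,y \in X} \|x - y\| \sum_{i=1}^m \mu_i + m\alpha \left( \max_{i \in V} \{C_i + \nu_i\} \right)^2 \left( \frac{9}{2} + \frac{2m\theta\beta}{1-\beta} \right).$$

□ 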



The network topology influences the error only through the term $\frac{\theta\beta}{1-\beta}$, which can 
hence be used as a figure of merit for comparing different topologies. For a network 
that is strongly connected at every time (i.e., Q = 1 in Assumption 2) and when $\eta$ 
in Assumption 3 does not depend on the number m of agents, the term $\frac{\theta\beta}{1-\beta}$ is of the 
order $m^2$ and the error bound scales as $m^4$. 

We next show that stronger bounds can be obtained for specific weighted time 
averages of the iterates $w_{i,k}$. In particular, we investigate the limiting behavior of 
$\{f(z_{i,t})\}$, where $z_{i,t} = \frac{\sum_{k=1}^t \alpha_{k+1} w_{i,k}}{\sum_{k=1}^t \alpha_{k+1}}$. Note that agent i can locally and recursively 
evaluate $z_{i,t+1}$ from $z_{i,t}$ and $w_{i,t+1}$. 
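
The recursive evaluation mentioned above can be sketched as follows. Writing $S_t = \sum_{k=1}^t \alpha_{k+1}$, the definition gives $z_{i,t+1} = z_{i,t} + \frac{\alpha_{t+2}}{S_{t+1}} (w_{i,t+1} - z_{i,t})$ with $S_{t+1} = S_t + \alpha_{t+2}$, so an agent only has to store its current average and the running stepsize sum. The function name `update_average` and the numerical values below are hypothetical, chosen only to check the recursion against the direct formula.

```python
def update_average(z, S, alpha_next, w_next):
    """Given z_t and S_t, return (z_{t+1}, S_{t+1}) for the weighted time average.

    alpha_next is the weight alpha_{t+2} attached to the new iterate w_{t+1}.
    """
    S_new = S + alpha_next
    z_new = [zi + (alpha_next / S_new) * (wi - zi) for zi, wi in zip(z, w_next)]
    return z_new, S_new

# Hypothetical data: weights alpha_{k+1} and iterates w_{j,k} for k = 1, ..., 5.
alphas = [1.0 / (k + 1) for k in range(1, 6)]
iterates = [[1.0, 0.0], [0.5, 2.0], [0.25, 1.0], [0.75, 1.5], [0.6, 1.2]]

z, S = iterates[0], alphas[0]  # z_{j,1} = w_{j,1}, S_1 = alpha_2
for a, w in zip(alphas[1:], iterates[1:]):
    z, S = update_average(z, S, a, w)

# Direct evaluation of the definition, for comparison.
direct = [sum(a * w[d] for a, w in zip(alphas, iterates)) / sum(alphas)
          for d in range(2)]
```

The recursive value `z` agrees with `direct` up to floating-point rounding, so no past iterates need to be stored.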

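Theorem 5.4. Consider the weighted time averages $z_{j,t} = \frac{\sum_{k=1}^t \alpha_{k+1} w_{j,k}}{\sum_{k=1}^t \alpha_{k+1}}$ for $j \in V$ 
and $t \ge 1$. Let the conditions of Theorem 5.3 hold. Then, we have for all $j \in V$, 

$$\limsup_{t\to\infty} \mathsf{E}[f(z_{j,t})] \le f^* + \max_{x,y \in X} \|x - y\| \sum_{i=1}^m \mu_i + m\alpha \left( \max_{i \in V} \{C_i + \nu_i\} \right)^2 \left( \frac{9}{2} + \frac{2m\theta\beta}{1-\beta} \right).$$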
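Proof. The relation in (5.3) of Theorem 5.3 is valid, and we have for any $x^* \in X^*$,

$$\begin{aligned}
\sum_{i=1}^m E[\|w_{i,k+1} - x^*\|^2] \le{} & \sum_{i=1}^m E[\|w_{i,k} - x^*\|^2] - 2\alpha_{k+1} \big(E[f(y_k)] - f^*\big) \\
& + 2\alpha_{k+1} \Big(\max_{i\in V} C_i\Big) \sum_{i=1}^m E[\|y_k - w_{i,k}\|] \\
& + 2\alpha_{k+1} \max_{x,y\in X}\|x - y\| \sum_{i=1}^m \|E[\epsilon_{i,k+1}]\| + \alpha_{k+1}^2 \sum_{i=1}^m (C_i + \bar\nu_i)^2.
\end{aligned}$$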



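From the subgradient boundedness and the subgradient inequality in (2.2) we have
for any $j$,

$$E[f(y_k)] - E[f(w_{j,k})] \ge -\Big(\sum_{i=1}^m C_i\Big) E[\|y_k - w_{j,k}\|] \ge -m \Big(\max_{i\in V} C_i\Big) E[\|y_k - w_{j,k}\|].$$

Therefore, we obtain

$$\begin{aligned}
\sum_{i=1}^m E[\|w_{i,k+1} - x^*\|^2] \le{} & \sum_{i=1}^m E[\|w_{i,k} - x^*\|^2] - 2\alpha_{k+1} \big(E[f(w_{j,k})] - f^*\big) \\
& + 2\alpha_{k+1} \Big(\max_{i\in V} C_i\Big) \Big( m E[\|y_k - w_{j,k}\|] + \sum_{i=1}^m E[\|y_k - w_{i,k}\|] \Big) \\
& + 2\alpha_{k+1} \max_{x,y\in X}\|x - y\| \sum_{i=1}^m \|E[\epsilon_{i,k+1}]\| + \alpha_{k+1}^2 \sum_{i=1}^m (C_i + \bar\nu_i)^2.
\end{aligned}$$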

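By re-arranging these terms, summing over $k = 1, \ldots, t$ and dividing by $2\sum_{k=1}^t \alpha_{k+1}$,
we further obtain

$$\begin{aligned}
\frac{\sum_{k=1}^t \alpha_{k+1} E[f(w_{j,k})]}{\sum_{k=1}^t \alpha_{k+1}} \le{} & f^* + \frac{1}{2\sum_{k=1}^t \alpha_{k+1}} \sum_{i=1}^m E[\|w_{i,1} - x^*\|^2] \\
& + \frac{\sum_{k=1}^t \alpha_{k+1} \big(\max_{i\in V} C_i\big) \big( m E[\|y_k - w_{j,k}\|] + \sum_{i=1}^m E[\|y_k - w_{i,k}\|] \big)}{\sum_{k=1}^t \alpha_{k+1}} \\
& + \max_{x,y\in X}\|x - y\| \, \frac{\sum_{k=1}^t \alpha_{k+1} \sum_{i=1}^m \|E[\epsilon_{i,k+1}]\|}{\sum_{k=1}^t \alpha_{k+1}} + \frac{\sum_{k=1}^t \alpha_{k+1}^2}{2\sum_{k=1}^t \alpha_{k+1}} \sum_{i=1}^m (C_i + \bar\nu_i)^2.
\end{aligned}$$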



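Next, by the convexity of $f$ note that

$$E[f(z_{j,t})] \le \frac{\sum_{k=1}^t \alpha_{k+1} E[f(w_{j,k})]}{\sum_{k=1}^t \alpha_{k+1}}.$$

From the preceding two relations we obtain

$$\begin{aligned}
E[f(z_{j,t})] \le{} & f^* + \frac{1}{2\sum_{k=1}^t \alpha_{k+1}} \sum_{i=1}^m E[\|w_{i,1} - x^*\|^2] \\
& + \frac{\sum_{k=1}^t \alpha_{k+1} \big(\max_{i\in V} C_i\big) \big( m E[\|y_k - w_{j,k}\|] + \sum_{i=1}^m E[\|y_k - w_{i,k}\|] \big)}{\sum_{k=1}^t \alpha_{k+1}} \\
& + \max_{x,y\in X}\|x - y\| \, \frac{\sum_{k=1}^t \alpha_{k+1} \sum_{i=1}^m \|E[\epsilon_{i,k+1}]\|}{\sum_{k=1}^t \alpha_{k+1}} + \frac{\sum_{k=1}^t \alpha_{k+1}^2}{2\sum_{k=1}^t \alpha_{k+1}} \sum_{i=1}^m (C_i + \bar\nu_i)^2. \qquad (5.4)
\end{aligned}$$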
First note that in the limit as $t \to \infty$, the second term in (5.4) converges to 0 since
$\sum_{k=1}^\infty \alpha_{k+1} = \infty$. By using the results of Lemma 3.1(b) for the remaining terms, we
obtain

$$\begin{aligned}
\limsup_{t\to\infty} E[f(z_{j,t})] \le{} & f^* + \Big(\max_{i\in V} C_i\Big) \limsup_{k\to\infty} \Big( m E[\|y_k - w_{j,k}\|] + \sum_{i=1}^m E[\|y_k - w_{i,k}\|] \Big) \\
& + \max_{x,y\in X}\|x - y\| \sum_{i=1}^m \limsup_{k\to\infty} \|E[\epsilon_{i,k+1}]\| + \frac{\alpha}{2} \sum_{i=1}^m (C_i + \bar\nu_i)^2.
\end{aligned}$$

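By Theorem 5.1, we have for all $j \in V$,

$$\limsup_{k\to\infty} E[\|y_k - w_{j,k}\|] \le \alpha \max_{i\in V}\{C_i + \bar\nu_i\} \Big(2 + \frac{m\theta\beta}{1-\beta}\Big),$$

which when substituted in the preceding relation, yields

$$\begin{aligned}
\limsup_{t\to\infty} E[f(z_{j,t})] \le{} & f^* + 2m\alpha \Big(\max_{i\in V} C_i\Big) \max_{i\in V}\{C_i + \bar\nu_i\} \Big(2 + \frac{m\theta\beta}{1-\beta}\Big) \\
& + \max_{x,y\in X}\|x - y\| \sum_{i=1}^m \limsup_{k\to\infty} \|E[\epsilon_{i,k+1}]\| + \frac{\alpha}{2} \sum_{i=1}^m (C_i + \bar\nu_i)^2 \\
\le{} & f^* + \max_{x,y\in X}\|x - y\| \sum_{i=1}^m \limsup_{k\to\infty} \|E[\epsilon_{i,k+1}]\| \\
& + m\alpha \Big(\max_{i\in V}\{C_i + \bar\nu_i\}\Big)^2 \Big(\frac{9}{2} + \frac{2m\theta\beta}{1-\beta}\Big).
\end{aligned}$$

□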

The error bounds in Theorems 5.3 and 5.4 have the same form, but they apply to
different sequences of function evaluations. Furthermore, in Theorem 5.4 the bound
is for all subsequences of $E[f(z_{i,k})]$ for each agent $i$. In contrast, in Theorem 5.3 the
bound is only for a subsequence of $E[f(w_{i,k})]$ for each agent $i$. Theorem 5.4 demonstrates
that, due to the convexity of the objective function $f$, there is an advantage
when agents are using the running averages of their iterates.

When the error moments $\|E[\epsilon_{i,k+1}]\|$ converge to zero as $k \to \infty$, and the stepsize
converges to zero [$\alpha = 0$], Theorems 5.3 and 5.4 yield respectively

$$\liminf_{k\to\infty} E[f(w_{j,k})] = f^* \quad \text{and} \quad \lim_{k\to\infty} E[f(z_{j,k})] = f^*.$$



[Footnote 5] When the moments $\|E[\epsilon_{i,k+1}]\|$ are zero, it can be seen that the results of Theorems 5.3 and
5.4 hold when the boundedness of $X$ is replaced by the weaker assumption that the subgradients of
each $f_i$ are bounded over $X$.

When a constant stepsize $\alpha$ is used, the vector $z_{j,t}$ is simply the running average
of all the iterates of agent $j$ until time $t$, i.e., $z_{j,t} = \frac{1}{t} \sum_{k=1}^t w_{j,k}$. For this case, with
zero mean errors, the relation in (5.4) reduces to

$$\begin{aligned}
E[f(z_{j,t})] \le{} & f^* + \frac{1}{2t\alpha} \sum_{i=1}^m E[\|w_{i,1} - x^*\|^2] \\
& + \Big(\max_{i\in V} C_i\Big) \frac{1}{t} \sum_{k=1}^t \Big( m E[\|y_k - w_{j,k}\|] + \sum_{i=1}^m E[\|y_k - w_{i,k}\|] \Big) + \frac{\alpha}{2} \sum_{i=1}^m (C_i + \bar\nu_i)^2. \qquad (5.5)
\end{aligned}$$

This can be used to derive an estimate per iteration, as seen in the following. 

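COROLLARY 5.5. Under the conditions of Theorem 5.3 with $\|E[\epsilon_{i,k+1}]\| = 0$ and
$\alpha_k = \alpha$ for all $i$ and $k$, for the average sequences $\{z_{j,k}\}$ we have for all $t$ and $j$,

$$\begin{aligned}
E[f(z_{j,t})] \le{} & f^* + \frac{1}{2t\alpha} \sum_{i=1}^m E[\|w_{i,1} - x^*\|^2] + \frac{2m^2\theta\beta}{t(1-\beta)} \Big(\max_{i\in V} C_i\Big) \Big(\max_{i\in V} \|w_{i,0}\|\Big) \\
& + m\alpha \Big(\max_{i\in V}\{C_i + \bar\nu_i\}\Big)^2 \Big(\frac{9}{2} + \frac{2m\theta\beta}{1-\beta}\Big).
\end{aligned}$$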



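Proof. Taking the expectation in the relation of Lemma 4.1 we obtain

$$\begin{aligned}
E[\|y_{k+1} - w_{j,k+1}\|] \le{} & m\theta\beta^{k+1} \max_{i\in V} \|w_{i,0}\| + m\alpha\theta\beta \Big(\max_{i\in V}\{C_i + \bar\nu_i\}\Big) \sum_{\ell=1}^k \beta^{k-\ell} + 2\alpha \max_{i\in V}\{C_i + \bar\nu_i\} \\
\le{} & m\theta\beta^{k+1} \max_{i\in V} \|w_{i,0}\| + \alpha \Big(\max_{i\in V}\{C_i + \bar\nu_i\}\Big) \Big(2 + \frac{m\theta\beta}{1-\beta}\Big).
\end{aligned}$$

Combining the preceding relation with the inequality in (5.5), and using $\sum_{k=1}^t \beta^k \le \frac{\beta}{1-\beta}$,
we obtain

$$\begin{aligned}
E[f(z_{j,t})] \le{} & f^* + \frac{1}{2t\alpha} \sum_{i=1}^m E[\|w_{i,1} - x^*\|^2] + \frac{2m^2\theta\beta}{t(1-\beta)} \Big(\max_{i\in V} C_i\Big) \Big(\max_{i\in V} \|w_{i,0}\|\Big) \\
& + m\alpha \Big(\max_{i\in V}\{C_i + \bar\nu_i\}\Big)^2 \Big(\frac{9}{2} + \frac{2m\theta\beta}{1-\beta}\Big).
\end{aligned}$$

□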

The preceding equation provides a bound on the algorithm's performance at each
iteration. The bound can be used in obtaining stopping rules for the algorithm. For
example, consider the error free case ($\bar\nu_i = 0$) and suppose that the goal is to determine
the number of iterations required for agents to find a point in the $\epsilon$-optimal set, i.e.,
in the set $X_\epsilon = \{x \in X : f(x) \le f^* + \epsilon\}$. Minimizing the bound in Corollary 5.5
over different stepsize values $\alpha$, we can show that $\epsilon$-optimality can be achieved in
$N_\epsilon = $ iterations with a stepsize $\alpha_\epsilon = $ ; where $\psi_\epsilon$ is the positive root of the
quadratic equation
and $A$, $B$ and $C$ are

$$C = m \Big(\max_{i\in V}\{C_i + \bar\nu_i\}\Big)^2 \Big(\frac{9}{2} + \frac{2m\theta\beta}{1-\beta}\Big).$$

Since $\psi_\epsilon$ scales as $\sqrt{\epsilon}$, we can conclude that $N_\epsilon$ scales as $1/\epsilon^2$. Equivalently, we can
say that the level $\epsilon$ of sub-optimality diminishes inversely with the square root of the
number of iterations.
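A stopping rule of this kind can be sketched numerically. The sketch below assumes only the shape of the per-iteration bound, namely that the excess over $f^*$ is at most $a/(t\alpha) + b/t + c\alpha$ for nonnegative constants $a$, $b$, $c$ that do not depend on $t$ or $\alpha$. The constants, the function name and the sample values are hypothetical and are not the expressions $A$, $B$, $C$ of the paper.

```python
import math


def iterations_needed(a, b, c, alpha, eps):
    """Smallest t with a/(t*alpha) + b/t + c*alpha <= eps, or None.

    The bound decreases in t toward its floor c*alpha, so a finite answer
    exists only when the stepsize is small enough that c*alpha < eps.
    """
    slack = eps - c * alpha
    if slack <= 0.0:
        return None  # this stepsize cannot reach eps-optimality via the bound
    return max(1, math.ceil((a / alpha + b) / slack))


def bound(a, b, c, alpha, t):
    """The assumed per-iteration bound on the excess over f*."""
    return a / (t * alpha) + b / t + c * alpha


# Hypothetical constants, chosen as powers of two so the arithmetic is exact.
t_star = iterations_needed(1.0, 0.0, 1.0, alpha=0.25, eps=0.5)
too_large_step = iterations_needed(1.0, 0.0, 1.0, alpha=0.5, eps=0.5)
```

Halving the target $\epsilon$ forces a smaller stepsize as well as more iterations, which is the source of the slower-than-$1/\epsilon$ iteration count discussed above.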

6. Almost sure and mean square convergence. In this section, we impose
some additional assumptions on the subgradient errors to obtain almost sure consensus
among the agents and almost sure convergence of the iterates to an optimal solution of
(2.1). Towards this, define $F_k$ to be the $\sigma$-algebra $\sigma(\epsilon_{i,\ell};\ i \in V,\ 0 \le \ell \le k)$ generated
by the errors in the agent system up to time $k$. In other words, $F_k$ captures the
history of the errors until the end of time $k$. We use the following assumption on the
subgradient errors $\epsilon_{i,k}$.

Assumption 5. There are scalars $\nu_i$ such that $E[\|\epsilon_{i,k+1}\|^2 \mid F_k] \le \nu_i^2$ for all $k$
with probability 1.

Note that Assumption 5 is stronger than Assumption 4. Furthermore, when the
errors are independent across iterations and across agents, Assumption 5 reduces to
Assumption 4.

We start by analyzing the agents' disagreements measured in terms of distances
$\|y_k - w_{j,k}\|$. We have the following result.

Theorem 6.1. Let Assumptions 1, 2, 3, and 5 hold. Suppose that the subgradients
of each $f_i$ are uniformly bounded over $X$, i.e., for each $i \in V$ there is $C_i$ such
that

$$\|\nabla f_i(x)\| \le C_i \quad \text{for all } x \in X.$$

If $\sum_{k=0}^\infty \alpha_{k+1}^2 < \infty$, then with probability 1,

$$\sum_{k=1}^\infty \alpha_{k+2} \|y_{k+1} - w_{j,k+1}\| < \infty \quad \text{for all } j \in V.$$

Furthermore, for all $j \in V$, we have $\lim_{k\to\infty} \|y_{k+1} - w_{j,k+1}\| = 0$ with probability 1
and in mean square.

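Proof. By Lemma 4.1 and the subgradient boundedness, we have for all $j \in V$,

$$\begin{aligned}
\|y_{k+1} - w_{j,k+1}\| \le{} & m\theta\beta^{k+1} \max_{i\in V} \|w_{i,0}\| + \theta \sum_{\ell=1}^k \beta^{k+1-\ell} \sum_{i=1}^m \alpha_\ell \big(C_i + \|\epsilon_{i,\ell}\|\big) \\
& + \frac{1}{m} \sum_{i=1}^m \alpha_{k+1} \big(C_i + \|\epsilon_{i,k+1}\|\big) + \alpha_{k+1} \big(C_j + \|\epsilon_{j,k+1}\|\big).
\end{aligned}$$

Using the inequalities

$$\alpha_{k+2} \alpha_\ell \big(C_i + \|\epsilon_{i,\ell}\|\big) \le \frac{1}{2} \Big( \alpha_{k+2}^2 + \alpha_\ell^2 \big(C_i + \|\epsilon_{i,\ell}\|\big)^2 \Big)$$

and $(C_i + \|\epsilon_{i,\ell}\|)^2 \le 2C_i^2 + 2\|\epsilon_{i,\ell}\|^2$, we obtain

$$\begin{aligned}
\alpha_{k+2} \|y_{k+1} - w_{j,k+1}\| \le{} & \alpha_{k+2} m\theta\beta^{k+1} \max_{i\in V} \|w_{i,0}\| + \theta \sum_{\ell=1}^k \beta^{k+1-\ell} \sum_{i=1}^m \Big( \frac{1}{2}\alpha_{k+2}^2 + \alpha_\ell^2 \big(C_i^2 + \|\epsilon_{i,\ell}\|^2\big) \Big) \\
& + \frac{1}{m} \sum_{i=1}^m \Big( \frac{1}{2}\alpha_{k+2}^2 + \alpha_{k+1}^2 \big(C_i^2 + \|\epsilon_{i,k+1}\|^2\big) \Big) + \frac{1}{2}\alpha_{k+2}^2 + \alpha_{k+1}^2 \big(C_j^2 + \|\epsilon_{j,k+1}\|^2\big).
\end{aligned}$$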

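By using the inequalities $\sum_{\ell=1}^k \beta^{k+1-\ell} \le \frac{\beta}{1-\beta}$ for all $k \ge 1$ and $\frac{1}{2m} + \frac{1}{2} \le 1$, and by
grouping the terms accordingly, from the preceding relation we have

$$\begin{aligned}
\alpha_{k+2} \|y_{k+1} - w_{j,k+1}\| \le{} & \alpha_{k+2} m\theta\beta^{k+1} \max_{i\in V} \|w_{i,0}\| + \Big(1 + \frac{m\theta\beta}{2(1-\beta)}\Big) \alpha_{k+2}^2 \\
& + \theta \sum_{\ell=1}^k \alpha_\ell^2 \beta^{k+1-\ell} \sum_{i=1}^m \big(C_i^2 + \|\epsilon_{i,\ell}\|^2\big) \\
& + \frac{1}{m} \alpha_{k+1}^2 \sum_{i=1}^m \big(C_i^2 + \|\epsilon_{i,k+1}\|^2\big) + \alpha_{k+1}^2 \big(C_j^2 + \|\epsilon_{j,k+1}\|^2\big).
\end{aligned}$$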

i=l 

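Taking the conditional expectation and using $E[\|\epsilon_{i,\ell}\|^2 \mid F_{\ell-1}] \le \nu_i^2$, and then taking
the expectation again, we obtain

$$\begin{aligned}
E[\alpha_{k+2} \|y_{k+1} - w_{j,k+1}\|] \le{} & \alpha_{k+2} m\theta\beta^{k+1} \max_{i\in V} \|w_{i,0}\| + \Big(1 + \frac{m\theta\beta}{2(1-\beta)}\Big) \alpha_{k+2}^2 \\
& + \theta \Big(\sum_{i=1}^m \big(C_i^2 + \nu_i^2\big)\Big) \sum_{\ell=1}^k \alpha_\ell^2 \beta^{k+1-\ell} \\
& + \frac{\alpha_{k+1}^2}{m} \sum_{i=1}^m \big(C_i^2 + \nu_i^2\big) + \alpha_{k+1}^2 \big(C_j^2 + \nu_j^2\big).
\end{aligned}$$

Since $\sum_k \alpha_k^2 < \infty$ (and hence $\{\alpha_k\}$ is bounded), the first two terms and the last two
terms are summable. Furthermore, in view of Lemma 3.1 [part (b)], we have

$$\sum_{k=1}^\infty \sum_{\ell=1}^k \beta^{k+1-\ell} \alpha_\ell^2 < \infty.$$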

Thus, the third term is also summable. Hence $\sum_{k=1}^\infty E[\alpha_{k+2} \|y_{k+1} - w_{j,k+1}\|] < \infty$.
From the monotone convergence theorem [4], it follows that

$$E\Big[\sum_{k=1}^\infty \alpha_{k+2} \|y_{k+1} - w_{j,k+1}\|\Big] = \sum_{k=1}^\infty E[\alpha_{k+2} \|y_{k+1} - w_{j,k+1}\|],$$

and it is hence finite for all $j$. If the expected value of a random variable is finite,
then the variable has to be finite with probability 1; thus, with probability 1,

$$\sum_{k=1}^\infty \alpha_{k+2} \|y_{k+1} - w_{j,k+1}\| < \infty \quad \text{for all } j \in V. \qquad (6.1)$$
We now show that $\lim_{k\to\infty} \|y_k - w_{j,k}\| = 0$ with probability 1 for all $j \in V$. Note
that the conditions of Theorem 5.1 are satisfied with $\bar\nu_i = \nu_i$ and $\alpha = 0$. Therefore,
$\|y_k - w_{j,k}\|$ converges to 0 in the mean and from (Fatou's) Lemma 3.3 it follows that

$$0 \le E\Big[\liminf_{k\to\infty} \|y_k - w_{j,k}\|\Big] \le \liminf_{k\to\infty} E[\|y_k - w_{j,k}\|] = 0,$$

and hence $E[\liminf_{k\to\infty} \|y_k - w_{j,k}\|] = 0$. Therefore, with probability 1,

$$\liminf_{k\to\infty} \|y_k - w_{j,k}\| = 0. \qquad (6.2)$$

To complete the proof, in view of (6.2) it suffices to show that $\|y_k - w_{j,k}\|$ converges
with probability 1. To show this, we define

$$r_{i,k+1} = \sum_{j=1}^m a_{i,j}(k+1) w_{j,k} - \alpha_{k+1} \big(\nabla f_i(v_{i,k}) + \epsilon_{i,k+1}\big),$$

and note that $P_X[r_{i,k+1}] = w_{i,k+1}$ [see (2.3) and (2.4)]. Since $y_k = \frac{1}{m} \sum_{i=1}^m w_{i,k}$ and
the set $X$ is convex, it follows that $y_k \in X$ for all $k$. Therefore, by the non-expansive
property of the Euclidean projection in (3.4), we have $\|w_{i,k+1} - y_k\|^2 \le \|r_{i,k+1} - y_k\|^2$
for all $i \in V$ and all $k$. Summing these relations over all $i$, we obtain

$$\sum_{i=1}^m \|w_{i,k+1} - y_k\|^2 \le \sum_{i=1}^m \|r_{i,k+1} - y_k\|^2 \quad \text{for all } k.$$

From $y_{k+1} = \frac{1}{m} \sum_{i=1}^m w_{i,k+1}$ and the fact that the average of vectors minimizes the
sum of squared distances between each vector and an arbitrary vector in $\mathbb{R}^n$ [cf. Eq. (3.1)], we
further obtain

$$\sum_{i=1}^m \|w_{i,k+1} - y_{k+1}\|^2 \le \sum_{i=1}^m \|w_{i,k+1} - y_k\|^2.$$

Therefore, for all $k$,

$$\sum_{i=1}^m \|w_{i,k+1} - y_{k+1}\|^2 \le \sum_{i=1}^m \|r_{i,k+1} - y_k\|^2. \qquad (6.3)$$



We next relate $\sum_{i=1}^m \|r_{i,k+1} - y_k\|^2$ to $\sum_{i=1}^m \|w_{i,k} - y_k\|^2$. From the definition of
$r_{i,k+1}$ and the equality $\sum_{j=1}^m a_{i,j}(k+1) = 1$ [cf. Assumption 3b], we have

$$r_{i,k+1} - y_k = \sum_{j=1}^m a_{i,j}(k+1) \big(w_{j,k} - y_k\big) - \alpha_{k+1} \big(\nabla f_i(v_{i,k}) + \epsilon_{i,k+1}\big).$$

By Assumptions 3a and 3b, we have that the weights $a_{i,j}(k+1)$, $j \in V$, yield a convex
combination. Thus, by the convexity of the norm [(3.2) and (3.3)] and by the
subgradient boundedness, we have

$$\begin{aligned}
\|r_{i,k+1} - y_k\|^2 \le{} & \sum_{j=1}^m a_{i,j}(k+1) \|w_{j,k} - y_k\|^2 + \alpha_{k+1}^2 \|\nabla f_i(v_{i,k}) + \epsilon_{i,k+1}\|^2 \\
& + 2\alpha_{k+1} \|\nabla f_i(v_{i,k}) + \epsilon_{i,k+1}\| \sum_{j=1}^m a_{i,j}(k+1) \|w_{j,k} - y_k\| \\
\le{} & \sum_{j=1}^m a_{i,j}(k+1) \|w_{j,k} - y_k\|^2 + 2\alpha_{k+1}^2 \big(C_i^2 + \|\epsilon_{i,k+1}\|^2\big) \\
& + 2\alpha_{k+1} \big(C_i + \|\epsilon_{i,k+1}\|\big) \sum_{j=1}^m a_{i,j}(k+1) \|w_{j,k} - y_k\|.
\end{aligned}$$

Summing over all $i$ and using $\sum_{i=1}^m a_{i,j}(k+1) = 1$ [cf. Assumption 3d], we obtain

$$\begin{aligned}
\sum_{i=1}^m \|r_{i,k+1} - y_k\|^2 \le{} & \sum_{j=1}^m \|w_{j,k} - y_k\|^2 + 2\alpha_{k+1}^2 \sum_{i=1}^m \big(C_i^2 + \|\epsilon_{i,k+1}\|^2\big) \\
& + 2\alpha_{k+1} \sum_{i=1}^m \big(C_i + \|\epsilon_{i,k+1}\|\big) \sum_{j=1}^m a_{i,j}(k+1) \|w_{j,k} - y_k\|.
\end{aligned}$$

Using this in (6.3) and taking the conditional expectation, we see that for all $k$, we
have with probability 1,

$$\begin{aligned}
\sum_{i=1}^m E[\|w_{i,k+1} - y_{k+1}\|^2 \mid F_k] \le{} & \sum_{i=1}^m \|w_{i,k} - y_k\|^2 + 2\alpha_{k+1}^2 \sum_{i=1}^m \big(C_i^2 + \nu_i^2\big) \\
& + 2\alpha_{k+1} \sum_{i=1}^m (C_i + \nu_i) \sum_{j=1}^m \|w_{j,k} - y_k\|, \qquad (6.4)
\end{aligned}$$

where we use $a_{i,j}(k+1) \le 1$ for all $i, j$ and $k$, and the relations $E[\|\epsilon_{i,k+1}\|^2 \mid F_k] \le \nu_i^2$,
$E[\|\epsilon_{i,k+1}\| \mid F_k] \le \nu_i$ holding with probability 1.

We now apply Theorem 3.4 to the relation in (6.4). To verify that the conditions of
Theorem 3.4 are satisfied, note that the stepsize satisfies $\sum_{k=1}^\infty \alpha_{k+1}^2 < \infty$ for all $i \in V$.
We also have $\sum_{k=1}^\infty \alpha_{k+1} \|w_{j,k} - y_k\| < \infty$ with probability 1 [cf. (6.1)]. Therefore,
the relation in (6.4) satisfies the conditions of Theorem 3.4 with $\zeta_k = D_k = 0$, thus
implying that $\|w_{j,k} - y_k\|$ converges with probability 1 for every $j \in V$. □

Let us compare Theorem 6.1 and Corollary 5.2. Corollary 5.2 provided sufficient
conditions for the different agents to have consensus in the mean. Theorem 6.1
strengthens this to consensus with probability 1 and in the mean square sense, for a
smaller class of stepsize sequences under a stricter assumption.

We next show that the consensus vector is actually in the optimal set, provided
that the optimal set is nonempty and the conditional expectations $\|E[\epsilon_{i,k+1} \mid F_k]\|$
are diminishing.

Theorem 6.2. Let Assumptions 1, 2, 3, and 5 hold. Suppose that the subgradients
of each $f_i$ are uniformly bounded over $X$, i.e., for each $i \in V$ there is $C_i$ such that

$$\|\nabla f_i(x)\| \le C_i \quad \text{for all } x \in X.$$

Also, assume that $\sum_{k=0}^\infty \|E[\epsilon_{i,k+1} \mid F_k]\|^2 < \infty$ for all $i \in V$. Further, let the stepsize
sequence $\{\alpha_k\}$ be such that $\sum_{k=1}^\infty \alpha_k = \infty$ and $\sum_{k=1}^\infty \alpha_k^2 < \infty$. Then, if the optimal
set $X^*$ is nonempty, the iterate sequence $\{w_{i,k}\}$ of each agent $i \in V$ converges to the
same optimal point with probability 1 and in mean square.

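Proof. Observe that the conditions of Lemma 4.2 are satisfied. Letting $z = x^*$
for some $x^* \in X^*$, taking conditional expectations and using the bounds on the error
moments, we obtain for any $x^* \in X^*$ and any $k$, with probability 1,

$$\begin{aligned}
\sum_{i=1}^m E[\|w_{i,k+1} - x^*\|^2 \mid F_k] \le{} & \sum_{i=1}^m \|v_{i,k} - x^*\|^2 - 2\alpha_{k+1} \big(f(y_k) - f^*\big) \\
& + 2\alpha_{k+1} \Big(\max_{i\in V} C_i\Big) \sum_{j=1}^m \|y_k - w_{j,k}\| \\
& + 2\alpha_{k+1} \sum_{i=1}^m \mu_{i,k+1} \|v_{i,k} - x^*\| + \alpha_{k+1}^2 \sum_{i=1}^m (C_i + \nu_i)^2,
\end{aligned}$$

where $f^* = f(x^*)$, and we use the notation $\mu_{i,k+1} = \|E[\epsilon_{i,k+1} \mid F_k]\|$. Using the
inequality

$$2\alpha_{k+1} \mu_{i,k+1} \|v_{i,k} - x^*\| \le \alpha_{k+1}^2 \|v_{i,k} - x^*\|^2 + \mu_{i,k+1}^2,$$

we obtain with probability 1,

$$\begin{aligned}
\sum_{i=1}^m E[\|w_{i,k+1} - x^*\|^2 \mid F_k] \le{} & \sum_{i=1}^m \big(1 + \alpha_{k+1}^2\big) \|v_{i,k} - x^*\|^2 \\
& - 2\alpha_{k+1} \Big( \big(f(y_k) - f^*\big) - \Big(\max_{i\in V} C_i\Big) \sum_{j=1}^m \|y_k - w_{j,k}\| \Big) \\
& + \sum_{i=1}^m \mu_{i,k+1}^2 + \alpha_{k+1}^2 \sum_{i=1}^m (C_i + \nu_i)^2. \qquad (6.5)
\end{aligned}$$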

By Theorem 6.1, we have with probability 1,

$$\sum_k \alpha_{k+1} \|w_{j,k} - y_k\| < \infty.$$

Further, since $\sum_k \mu_{i,k}^2 < \infty$ and $\sum_k \alpha_k^2 < \infty$ with probability 1, the relation in
(6.5) satisfies the conditions of Theorem 3.4. We therefore have

$$\sum_k \alpha_{k+1} \big(f(y_k) - f^*\big) < \infty, \qquad (6.6)$$

and $\|v_{i,k} - x^*\|$ converges with probability 1 and in mean square. In addition, by
Theorem 6.1, we have $\lim_{k\to\infty} \|w_{i,k} - y_k\| = 0$ for all $i$, with probability 1. Hence,
$\lim_{k\to\infty} \|v_{i,k} - y_k\| = 0$ for all $i$, with probability 1. Therefore, $\|y_k - x^*\|$ converges
with probability 1 for any $x^* \in X^*$. Moreover, from (6.6) and the fact that $\sum_k \alpha_k = \infty$,
by continuity of $f$, it follows that $y_k$, and hence $w_{i,k}$, must converge to a vector
in $X^*$ with probability 1 and in mean square. □

Note that the result of Theorem 6.2 holds without assuming compactness of the
constraint set $X$. This was possible due to the assumption that both the stepsize
$\alpha_k$ and the norms $\|E[\epsilon_{i,k+1} \mid F_k]\|$ of the conditional errors are square summable.
In addition, note that the result of Theorem 6.2 remains valid when the condition
$\sum_{k=0}^\infty \|E[\epsilon_{i,k+1} \mid F_k]\|^2 < \infty$ for all $i$ is replaced with $\sum_{k=0}^\infty \alpha_{k+1} \|E[\epsilon_{i,k+1} \mid F_k]\| < \infty$
for all $i$.
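The stepsize conditions of Theorem 6.2 (a divergent sum with a convergent sum of squares) can be checked numerically for a candidate sequence. The sketch below uses $\alpha_k = 1/k$ as an illustrative choice, which is an assumption made here and not a sequence singled out in the paper. Its partial sums exceed $\ln n$, while the partial sums of its squares stay below $\pi^2/6$.

```python
import math


def partial_sums(stepsize, n):
    """Return (sum of alpha_k, sum of alpha_k^2) for k = 1..n."""
    total, total_sq = 0.0, 0.0
    for k in range(1, n + 1):
        a = stepsize(k)
        total += a
        total_sq += a * a
    return total, total_sq


def harmonic(k):
    """Illustrative diminishing stepsize alpha_k = 1/k."""
    return 1.0 / k


n = 100000
sum_alpha, sum_alpha_sq = partial_sums(harmonic, n)
log_n = math.log(n)                 # lower bound on the harmonic partial sum
basel_limit = math.pi ** 2 / 6.0    # limit of the sum of 1/k^2

# A constant stepsize, by contrast, is not square summable:
_, const_sq = partial_sums(lambda k: 0.1, n)   # equals 0.01 * n up to rounding
```

A constant stepsize violates the square-summability condition, which is consistent with Section 5 giving only error bounds, rather than convergence, in that case.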
7. Implications. The primary source of stochastic errors in the subgradient
evaluation is when the objective function is not completely known and has some
randomness in it. Such settings arise in sensor network applications that involve
distributed and recursive estimation [24].

Let the function $f_i(x)$ be given by $f_i(x) = E[g_i(x, R_i)]$, where $R_i$ is a random
variable whose statistics are independent of $x$. The statistics of $R_i$ are not available
to agent $i$ and hence the function $f_i$ is not known to agent $i$. Instead, agent $i$ observes
samples of $R_i$ in time. Thus, in a subgradient algorithm for minimizing the
function, the subgradient must be suitably approximated using the observed samples.
In the Robbins-Monro stochastic approximation [22], the subgradient $\nabla f_i(x)$ is
approximated by $\nabla g_i(x, r_i)$, where $r_i$ denotes a sample of $R_i$. The associated distributed
Robbins-Monro stochastic optimization algorithm is

$$w_{i,k+1} = P_X\big[v_{i,k} - \alpha_{k+1} \nabla g_i(v_{i,k}, r_{i,k+1})\big], \qquad (7.1)$$

where $r_{i,k+1}$ is a sample of $R_i$ obtained at time $k$. The expression for the error is

$$\epsilon_{i,k+1} = \nabla g_i(v_{i,k}, r_{i,k+1}) - E[\nabla g_i(v_{i,k}, R_i)].$$

If the samples obtained across iterations are independent then

$$E[\epsilon_{i,k+1} \mid F_k] = E[\epsilon_{i,k+1} \mid v_{i,k}] = 0.$$

If in addition, $\mathrm{Var}[\nabla g_i(x, R_i)]$ is bounded for all $x \in X$ then the conditions of
Theorems 5.3, 5.4 and 6.2 are satisfied.
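The update (7.1) can be illustrated with a small simulation. The sketch below is a toy instance under stated assumptions that are not taken from the paper: scalar iterates, $g_i(x, r) = \frac{1}{2}(x - r)^2$ with $R_i$ Gaussian with mean `targets[i]` and unit variance, the interval $X = [-10, 10]$, a fixed doubly stochastic weight matrix on a ring of four agents, and the stepsize $\alpha_{k+1} = 1/(k+1)$. Under these assumptions the sum of the $f_i$ is minimized at the average of the means.

```python
import random


def project(x, lo, hi):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


def distributed_robbins_monro(targets, weights, num_iters,
                              lo=-10.0, hi=10.0, seed=0):
    """Run w_{i,k+1} = P_X[v_{i,k} - alpha_{k+1} * grad g_i(v_{i,k}, r_{i,k+1})]."""
    rng = random.Random(seed)
    m = len(targets)
    w = [0.0] * m
    for k in range(num_iters):
        alpha = 1.0 / (k + 1)  # divergent sum, square summable
        # Mixing step: v_i is a convex combination of the neighbors' iterates.
        v = [sum(weights[i][j] * w[j] for j in range(m)) for i in range(m)]
        new_w = []
        for i in range(m):
            r = rng.gauss(targets[i], 1.0)   # fresh sample of R_i
            grad = v[i] - r                  # gradient of 0.5*(x - r)^2 at x = v_i
            new_w.append(project(v[i] - alpha * grad, lo, hi))
        w = new_w
    return w


# Hypothetical 4-agent ring with symmetric, doubly stochastic weights.
ring = [[0.50, 0.25, 0.00, 0.25],
        [0.25, 0.50, 0.25, 0.00],
        [0.00, 0.25, 0.50, 0.25],
        [0.25, 0.00, 0.25, 0.50]]
targets = [1.0, 2.0, 3.0, 4.0]
estimates = distributed_robbins_monro(targets, ring, num_iters=5000)
optimum = sum(targets) / len(targets)
```

In this toy problem every agent ends close to the common minimizer even though each one only ever samples its own $R_i$, in line with the consensus and convergence statements of Theorem 6.2.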

Let us next consider the case when $f_i(x) = E[g_i(x, R_i(x))]$, where $R_i(x)$ is a
random variable that is parameterized by $x$. To keep the discussion simple, let us
assume that $x \in \mathbb{R}$. As in the preceding case, the statistics of $R_i(x)$ are not known
to agent $i$, but the agent can obtain samples of $R_i(x)$ for any value of $x$. In the
Kiefer-Wolfowitz approximation [22],

$$\nabla f_i(x) \approx \frac{g_i(x, r_i(x + \beta)) - g_i(x, r_i(x))}{\beta},$$

where $r_i(x)$ is a sample of the random variable $R_i(x)$. The corresponding distributed
optimization algorithm is

$$w_{i,k+1} = P_X\Big[ v_{i,k} - \alpha_{k+1} \frac{g_i(v_{i,k}, r_i(v_{i,k} + \beta_{i,k+1})) - g_i(v_{i,k}, r_i(v_{i,k}))}{\beta_{i,k+1}} \Big],$$

where $\beta_{i,k+1}$ is a positive scalar. In this case, the error is

$$\epsilon_{i,k+1} = \frac{g_i(v_{i,k}, r_i(v_{i,k} + \beta_{i,k+1})) - g_i(v_{i,k}, r_i(v_{i,k}))}{\beta_{i,k+1}} - \nabla f_i(v_{i,k}).$$

If the function $g_i$ is differentiable then $E[\epsilon_{i,k+1} \mid v_{i,k}]$ is of the order $\beta_{i,k+1}$. Thus, the
conditions on the mean value of the errors can be controlled through the sequence
$\{\beta_{i,k}\}$ and the conditions in Theorems 5.3, 5.4 and 6.2 can be met by suitably choosing
the sequence $\{\beta_{i,k}\}$.
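The claim that the mean error is of the order of the perturbation size can be seen in a noise-free toy case. The sketch below assumes, purely for illustration, $g(x, r) = r$ and a deterministic "sample" $r(x) = x^2$, so that $f(x) = x^2$ and $\nabla f(x) = 2x$. The finite-difference estimate then equals $2x + \beta$, and its error is exactly $\beta$. The function and variable names are hypothetical.

```python
def kiefer_wolfowitz_gradient(g, sampler, x, beta):
    """One-sided estimate (g(x, r(x + beta)) - g(x, r(x))) / beta.

    g       -- function g(x, r)
    sampler -- returns a sample r(x) of the x-dependent random variable
    beta    -- positive perturbation size
    """
    if beta <= 0.0:
        raise ValueError("beta must be positive")
    return (g(x, sampler(x + beta)) - g(x, sampler(x))) / beta


def g(x, r):
    """Toy integrand: the observed sample itself."""
    return r


def sampler(x):
    """Noise-free toy 'sample' with mean x^2, so f(x) = x^2."""
    return x * x


x = 1.0
true_gradient = 2.0 * x
# Error of the estimate for a shrinking sequence of perturbation sizes.
errors = {beta: kiefer_wolfowitz_gradient(g, sampler, x, beta) - true_gradient
          for beta in (0.5, 0.25, 0.125)}
```

Halving $\beta$ halves the bias in this example, so a sequence $\{\beta_{i,k}\}$ that decays fast enough drives the mean error to zero at a controllable rate.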
8. Discussion. We studied the effects of stochastic subgradient errors on a
distributed algorithm for networks of agents with time-varying connectivity. We first
considered very general errors with bounded second moments and obtained explicit
bounds on the agent disagreements and on the expected deviation of the limiting
function value from the optimal. The bounds are explicitly given as a function of
the network properties, objective function and the error moments. For networks that
are connected at all times and for which $\eta$ is independent of the size of the network, the bound
scales as $\alpha (\max_{i\in V}\{C_i + \bar\nu_i\})^2 m^4$, where $m$ is the number of agents in the network, $\alpha$
is the stepsize limit, and $C_i$ and $\bar\nu_i^2$ are respectively the subgradient norm bound and
the bound on the second moment of the subgradient errors for agent $i$. For the constant
stepsize case, we obtained a bound on the performance of the algorithm after a
finite number of iterations. There, we showed that deviation from the "error-bound"
diminishes at rate $1/t$, where $t$ is the number of iterations. Finally, we proved that
when the expected error and the stepsize converge to 0 sufficiently fast, the agents
reach a consensus and the iterate sequences of agents converge to a common optimal
point with probability 1 and in mean square.

We make the following remarks. First, it can be shown that the disagreement
results in Corollary 5.2 and Theorem 6.1 hold even when the agents use non-identical
stepsizes. However, with non-identical agent stepsizes there is no guarantee that the
sum of the objectives, rather than a weighted sum, is minimized.

Future work includes several important extensions of the distributed model studied
here. First, we have assumed no communication delays between the agents and
synchronous processing. An important extension is to consider the properties of the
algorithm in asynchronous networks with communication delays, as in [30]. Second,
we assumed a perfect communication scenario, i.e., noiseless communication links.
In wireless network applications, the links are typically noisy and this has to be taken
into consideration. Third, we have considered the class of convex functions. This
restricts the number of possible applications for the algorithm. Further research is needed to
develop distributed algorithms when the functions $f_i$ are not convex.